Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix ChainedVector constructor performance #103

Merged
merged 1 commit into from
Jul 7, 2024

Conversation

JoaoAparicio
Copy link
Contributor

Currently the ChainedVector constructor has bad performance if the args have empty entries.

import Random
struct MyType
    x::Float64
    y::Float64
end

x1 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(1:6)] for _ in 1:1_000_000];
@time ChainedVector(x1);  # 0.017 secs

x2 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
@time ChainedVector(x2);  # 24 secs

Note that the difference comes only from the fact that x2 might have empty entries whereas x1 doesn't.

The problem is that the function cleanup! has runtime proportional to the number of elements times the number of elements which are empty vectors.

function cleanup_original!(arrays, inds)
    @assert length(arrays) == length(inds)
    for i = length(arrays):-1:1
        if !isassigned(arrays, i) || length(arrays[i]) == 0
            deleteat!(arrays, i)
            deleteat!(inds, i)
        end
    end
    return
end

function cleanup_new!(arrays, inds)
    @assert length(arrays) == length(inds)
    mask_it = Base.Iterators.filter(i->isassigned(arrays, i) && length(arrays[i])>0, 1:length(arrays))
    n = 0
    for k in mask_it
        n += 1
        k > n || continue
        arrays[n] = arrays[k]
        inds[n] = inds[k]
    end
    resize!(arrays, n)
    resize!(inds, n)
    return
end

We can check that the new version is faster and returns the same result

x = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
arrays = x;
inds = Vector{Int}(undef, length(arrays));

x1 = copy(x);
inds1 = copy(inds);
x2 = copy(x);
inds2 = copy(inds);

@time cleanup_original!(x1, inds1)  # 20 secs
@time cleanup_new!(x2, inds2)  # 0.04 secs

all(x1 .== x2)  # true
all(inds1 .== inds2)  # true

Currently the ChainedVector constructor has bad performance if the args have empty entries.

```
import Random
struct MyType
    x::Float64
    y::Float64
end

x1 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(1:6)] for _ in 1:1_000_000];
@time ChainedVector(x1);  # 0.017 secs

x2 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
@time ChainedVector(x2);  # 24 secs
```

Note that the difference comes only from the fact that x2 might have empty entries
whereas x1 doesn't.

The problem is that the function cleanup! has runtime proportional to the number of
elements times the number of elements which are empty vectors.

```
function cleanup_original!(arrays, inds)
    @Assert length(arrays) == length(inds)
    for i = length(arrays):-1:1
        if !isassigned(arrays, i) || length(arrays[i]) == 0
            deleteat!(arrays, i)
            deleteat!(inds, i)
        end
    end
    return
end

function cleanup_new!(arrays, inds)
    @Assert length(arrays) == length(inds)
    mask_it = Base.Iterators.filter(i->isassigned(arrays, i) && length(arrays[i])>0, 1:length(arrays))
    n = 0
    for k in mask_it
        n += 1
        k > n || continue
        arrays[n] = arrays[k]
        inds[n] = inds[k]
    end
    resize!(arrays, n)
    resize!(inds, n)
    return
end
```

We can check that the new version is faster and returns the same result

```
x = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
arrays = x;
inds = Vector{Int}(undef, length(arrays));

x1 = copy(x);
inds1 = copy(inds);
x2 = copy(x);
inds2 = copy(inds);

@time cleanup_original!(x1, inds1)  # 20 secs
@time cleanup_new!(x2, inds2)  # 0.04 secs

all(x1 .== x2)  # true
all(inds1 .== inds2)  # true
```
Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed PR description; this is a great improvement!

@quinnj quinnj merged commit dda9a2c into JuliaData:main Jul 7, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants